PACS 89.70.Cf - Entropy and other measures of information
PACS 89.75.-k - Complex systems

PACS 89.65.Gh - Economics; econophysics, financial markets, business and management

Abstract. - We discuss the connection between information and copula theories by showing that 
a copula can be employed to decompose the information content of a multivariate distribution 
into marginal and dependence components, with the latter quantified by the mutual information. 
We define the information excess as a measure of deviation from a maximum entropy distribution. 
The idea of marginal invariant dependence measures is also discussed and used to show that 
empirical linear correlation underestimates the amplitude of the actual correlation in the case 
of non-Gaussian marginals. The mutual information is shown to provide an upper bound for 
the asymptotic empirical log-likelihood of a copula. An analytical expression for the information 
excess of T-copulas is provided, allowing for simple model identification within this family. We 
illustrate the framework with a financial data set.
Introduction. - Modeling statistical dependence plays a pervasive role in science. Information theory provides a unifying framework for ideas from areas as diverse as differential geometry [1], physics [2-4], statistics and telecommunications [5]. From the information theoretic point of view, dependence can be quantified by measuring the distance between a given model defined by a joint probability density $\phi(\mathbf{x})$ and a mean field model defined by the product $\prod_{j=1}^{N} f_j(x_j)$, where $f_j(x_j) = \int \prod_{k \neq j} dx_k\, \phi(\mathbf{x})$ are the marginal densities [6]. The relative entropy given by

$$ S\Big[\phi \,\Big\|\, \prod_{j=1}^{N} f_j\Big] = \int \prod_{j=1}^{N} dx_j\, \phi(\mathbf{x}) \log \frac{\phi(\mathbf{x})}{\prod_{j=1}^{N} f_j(x_j)} \qquad (1) $$

defines a premetric in the space of distributions that can be employed to quantify the degree of dependence in a model. This particular measure is also known as the total correlation or, in the bivariate case, as the mutual information.
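To make eq. (1) concrete, the following minimal Python sketch evaluates its discrete analogue, with sums over outcomes in place of integrals, for a joint probability table. The function name `total_correlation` and the dictionary-of-tuples input format are illustrative choices and are not taken from the text.

```python
import math
from collections import defaultdict


def total_correlation(joint):
    """Discrete analogue of eq. (1), in nats.

    `joint` maps outcome tuples (x_1, ..., x_N) to probabilities summing to 1.
    Returns the relative entropy between the joint pmf and the product of its
    one-dimensional marginals.
    """
    n_vars = len(next(iter(joint)))
    marginals = [defaultdict(float) for _ in range(n_vars)]
    for outcome, p in joint.items():
        for j, x_j in enumerate(outcome):
            marginals[j][x_j] += p
    tc = 0.0
    for outcome, p in joint.items():
        if p > 0.0:
            product = math.prod(marginals[j][x_j] for j, x_j in enumerate(outcome))
            tc += p * math.log(p / product)
    return tc


# Three copies of one fair bit: total correlation = 3 log 2 - log 2 = 2 log 2.
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# Three independent fair bits: total correlation = 0.
independent = {(a, b, c): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print(total_correlation(copies), total_correlation(independent))
```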

The copula theory has been proposed in statistics as an approach for modeling general dependences in multivariate data. A theorem due to Sklar [7] ensures that, under very general conditions, for any joint cumulative distribution function (CDF) $F(\mathbf{x}) = \int_{-\infty}^{x_1}\!\cdots\int_{-\infty}^{x_N} \prod_{i=1}^{N} dx'_i\, \phi(\mathbf{x}')$ there is a function $C(\mathbf{u})$ (known as the copula function) such that the joint CDF can be written as a function of the marginal CDFs in the form $F(\mathbf{x}) = C[F_1(x_1), \ldots, F_N(x_N)]$. The converse is also true: this function couples any set of marginal CDFs to form a multivariate CDF. This provides a convenient picture of the marginals as being responsible for the idiosyncratic properties of each variable and of the copula function as a description of the dependence between them.

A complete articulation of these two concepts is, however, curiously absent in the literature. In this short contribution we seek to survey the basic ideas connecting these two threads, emphasizing the information theoretic interpretation.

We have organized this letter as follows. In the next sec- 
tion we briefly discuss the idea of measures of dependence 
that are marginal invariant. We then connect copula the- 
ory with mutual information by introducing the concept 
of copula information and present an analytical prescrip- 
tion to identify a model for bivariate non-Gaussian de- 
pendences within the T-copula family by estimating the 
mutual information. We briefly comment on general con- 
sequences and perspectives in a final section. 
Mutual information and copulas. - From this point on we restrict our discussion to bivariate distributions; the multivariate case follows after straightforward adaptations.

Two random variables X and Y are said to be statistically dependent if, and only if, their joint probability density function (PDF) cannot be written as a product of marginal PDFs, that is, if $\phi(x,y) \neq f_X(x) f_Y(y)$, where $f_X(x)$ and $f_Y(y)$ are the marginal densities. A convenient way to quantify statistical dependence is by evaluating the mutual information defined by:

$$ I(X,Y) = \int dx\, dy\, \phi(x,y) \log \frac{\phi(x,y)}{f_X(x) f_Y(y)} \qquad (2) $$



This quantity is a premetric, that is, it is non-negative and vanishes only in the case of independent variables. Define the entropy of the distribution of X as $S[f_X] = -\int dx\, f_X(x) \log f_X(x)$ and the average conditional entropy as $S[f_{X|Y}] = -\int dy\, f_Y(y) \int dx\, f_{X|Y}(x) \log f_{X|Y}(x)$, where $f_{X|Y}(x)$ denotes the conditional density of X given Y. The identity

$$ I(X,Y) = S[f_X] - S[f_{X|Y}] \qquad (3) $$

then provides an interpretation for the mutual information as the average reduction in the uncertainty in X given knowledge of Y. Alternatively, the mutual information can be regarded as a distance to statistical independence in the space of distributions, measured by the relative entropy between the actual joint distribution and the product of marginals, $I(X,Y) = S[\phi \,\|\, f_X f_Y]$.
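Identity (3) can be checked numerically in the same discrete setting. The 2x2 probability table below is made up for illustration and the helper names are hypothetical. The sketch confirms that the relative-entropy form of eq. (2) and the entropy difference of eq. (3) agree.

```python
import math


def entropy(probs):
    """Shannon entropy -sum p log p (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mutual_information(table):
    """Discrete analogue of eq. (2); table[i][j] = P(X = i, Y = j)."""
    px = [sum(row) for row in table]
    py = [sum(col) for col in zip(*table)]
    return sum(p * math.log(p / (px[i] * py[j]))
               for i, row in enumerate(table)
               for j, p in enumerate(row) if p > 0.0)


def conditional_entropy_x_given_y(table):
    """Average conditional entropy: sum_y P(y) * S[X | Y = y]."""
    py = [sum(col) for col in zip(*table)]
    return sum(py[j] * entropy([row[j] / py[j] for row in table])
               for j in range(len(py)) if py[j] > 0.0)


table = [[0.30, 0.10],
         [0.05, 0.55]]
px = [sum(row) for row in table]
lhs = mutual_information(table)
rhs = entropy(px) - conditional_entropy_x_given_y(table)
print(lhs, rhs)  # both sides of eq. (3); they agree up to rounding
```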

Sklar's theorem asserts that there exists a copula function such that the joint CDF can be written as $F(x,y) = C[F_X(x), F_Y(y)]$. We may also regard a copula function as the joint CDF of two uniformly distributed variables u and v, both in the [0, 1] interval. Such a pair (u, v) can always be found from any pair of continuous random variables with the substitution $u = F_X(x)$ and $v = F_Y(y)$.
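In practice $F_X$ is unknown. A common choice, assumed here rather than prescribed by the text, replaces it with the rank-based empirical CDF. This gives "pseudo-observations" $u_t = \mathrm{rank}(x_t)/(n+1)$. The sketch below uses a hypothetical helper `pseudo_observations` and also shows that these values are unchanged by a monotonically increasing reparametrization.

```python
import math


def pseudo_observations(xs):
    """Rank-based estimate of u = F_X(x): rank / (n + 1), strictly inside (0, 1).

    Assumes continuous data (no ties); ties would be broken by position.
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


x = [0.3, -1.2, 2.5, 0.9]
u = pseudo_observations(x)
# A monotonically increasing reparametrization leaves the u values unchanged.
u_exp = pseudo_observations([math.exp(v) for v in x])
print(u)           # [0.4, 0.2, 0.8, 0.6]
print(u == u_exp)  # True
```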

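To exemplify, we can build a joint standard Gaussian with correlation $\rho$ by plugging Gaussian marginal distributions $\Phi(x) = \int_{-\infty}^{x} \frac{du}{\sqrt{2\pi}}\, e^{-u^2/2}$ into the Gaussian copula defined as:

$$ C_\rho[u,v] = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{dr\, ds}{2\pi\sqrt{1-\rho^2}} \exp\left[-\frac{r^2 + s^2 - 2\rho r s}{2(1-\rho^2)}\right] \qquad (4) $$

Clearly X and Y are dependent if, and only if, $C[u,v] \neq uv$. Introducing the copula density as $c[u,v] = \frac{\partial^2}{\partial u\, \partial v} C[u,v]$, we can decompose the joint probability density as

$$ \phi(x,y) = c[F_X(x), F_Y(y)]\, f_X(x) f_Y(y) \qquad (5) $$

and observe that statistical dependence simply means that $c[u,v] \neq 1$.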
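For the Gaussian copula of eq. (4) the density has the closed form $c_\rho[u,v] = (1-\rho^2)^{-1/2} \exp\{-[\rho^2(a^2+b^2) - 2\rho a b]/[2(1-\rho^2)]\}$, with $a = \Phi^{-1}(u)$ and $b = \Phi^{-1}(v)$. The short sketch below checks the decomposition (5) at one arbitrary point, using only Python's `statistics.NormalDist`. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: provides cdf, inv_cdf and pdf


def gaussian_copula_density(u, v, rho):
    """c[u, v] = d^2 C_rho / du dv for the Gaussian copula of eq. (4)."""
    a, b = STD.inv_cdf(u), STD.inv_cdf(v)
    q = (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * (1.0 - rho * rho))
    return math.exp(-q) / math.sqrt(1.0 - rho * rho)


def bivariate_normal_pdf(x, y, rho):
    """Standard bivariate normal density with correlation rho."""
    q = (x * x + y * y - 2.0 * rho * x * y) / (2.0 * (1.0 - rho * rho))
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))


# Check eq. (5): phi(x, y) = c[F_X(x), F_Y(y)] f_X(x) f_Y(y)
rho, x, y = 0.7, 0.4, -1.1
lhs = bivariate_normal_pdf(x, y, rho)
rhs = gaussian_copula_density(STD.cdf(x), STD.cdf(y), rho) * STD.pdf(x) * STD.pdf(y)
print(lhs, rhs)
```

With $\rho = 0$ the same function returns exactly 1, the independence copula density.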

Marginal invariant measures. - Two closely related concepts in statistics are dependence and concordance. While dependence relates only to the functional relationship between two variables, concordance measures whether positive or negative comovement of the variables is present. Measures of dependence and concordance abound. However, a good dependence (resp., concordance) measure should [7,8]:

1. be invariant under reparametrizations $(x, y) \to (q(x), w(y))$, if $q(x)$ and $w(y)$ are monotonic functions (changing sign if one of the reparametrizations is a monotonically decreasing function, in the case of concordance measures),

2. have a unique minimum (a unique zero, in the case of concordance), which can be set to zero with no loss of generality, at $\phi(x,y) = f_X(x) f_Y(y)$.

Some authors would also require that a measure of dependence (concordance) be restricted to the [0, 1] ([-1, 1]) interval. We do not require it here, since the real line can be trivially mapped onto any interval. Good measures of concordance, on the other hand, must have a unique zero if X and Y are statistically independent, be invariant under monotonically increasing reparametrizations, and change sign if one of the functions of the reparametrization is monotonically decreasing.

With the concept of copula density at hand, these 
desiderata can be concisely restated as: a measure of de- 
pendence must be a functional of the copula density alone 
(i.e. must be independent of marginal densities), with a 
unique minimum at c[u, v] = 1. 

The linear correlation for standardized variables, $\rho(X,Y) = \int dx\, dy\, xy\, \phi(x,y)$, is widely used as a measure of concordance, and its absolute value as a measure of dependence. The correlation may be rewritten in terms of copula densities as:

$$ \rho(X,Y) = \int_{[0,1]^2} du\, dv\, c[u,v]\, F_X^{-1}(u)\, F_Y^{-1}(v) \qquad (6) $$

If X and Y are independent, $c[u,v] = 1$ and consequently $\rho(X,Y) = 0$. However, a copula may clearly be chosen such that the linear correlation vanishes even though $c[u,v] \neq 1$. Moreover, $\rho(X,Y)$ obviously depends on the marginal distributions.
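Both caveats are easy to see on toy data. In the sketch below (illustrative data, hypothetical `pearson` helper), Y is a deterministic function of X, yet the linear correlation is exactly zero. A monotone change of one marginal also changes the linear correlation, even though the copula is untouched.

```python
import math


def pearson(xs, ys):
    """Sample linear correlation, the estimator behind eq. (6)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Y is fully determined by X, yet the linear correlation vanishes.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]
print(pearson(xs, ys))  # 0.0

# A monotone change of marginal (z -> exp(z)) alters the linear correlation.
zs = [0.1, 0.5, 0.9, 1.4, 2.0]
ws = [0.2, 0.4, 1.1, 1.3, 2.2]
print(pearson(zs, ws), pearson([math.exp(z) for z in zs], ws))
```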

A better alternative for measuring concordance is the rank correlation, also known as Spearman's $\rho$, defined as

$$ \rho_{\rm rank}(X,Y) = 12 \int_{[0,1]^2} du\, dv\, c[u,v]\, uv - 3 \qquad (7) $$

This measure strictly fulfills the desiderata for concordance measures. For a Gaussian bivariate distribution, the rank correlation is related to the correlation parameter as:

$$ \rho_{\rm rank} = \frac{6}{\pi} \arcsin\left(\frac{\rho}{2}\right) \qquad (8) $$

where $\rho$ is the correlation parameter of the Gaussian copula, which is identical to the usual linear correlation only if the marginals are also Gaussian. Another marginal invariant measure is Kendall's tau, defined as

$$ \tau(X,Y) = 4 \int_{[0,1]^2} dC[u,v]\, C[u,v] - 1 \qquad (9) $$



In the case of meta-elliptical distributions [9], which include Gaussian and T copulas, Kendall's tau is also related to the correlation parameter:

$$ \tau = \frac{2}{\pi} \arcsin \rho \qquad (10) $$
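Sample versions of eqs. (7) and (9), together with the inversions of eqs. (8) and (10), can be sketched as follows. These are the textbook estimators for data without ties, which is an assumption here. The helper names are hypothetical.

```python
import math


def ranks(xs):
    """Ranks 1..n (assumes no ties)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r


def spearman_rho(xs, ys):
    """Sample rank correlation, the empirical counterpart of eq. (7) (no ties)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def kendall_tau(xs, ys):
    """Sample Kendall's tau, (concordant - discordant) / (n choose 2), cf. eq. (9)."""
    n, s = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            sx = (xs[j] > xs[i]) - (xs[j] < xs[i])
            sy = (ys[j] > ys[i]) - (ys[j] < ys[i])
            s += sx * sy
    return 2.0 * s / (n * (n - 1))


def rho_from_spearman(r):
    """Invert eq. (8): rho = 2 sin(pi * rho_rank / 6)."""
    return 2.0 * math.sin(math.pi * r / 6.0)


def rho_from_kendall(t):
    """Invert eq. (10): rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * t / 2.0)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
print(spearman_rho(x, y), kendall_tau(x, y))  # 0.8 and 2/3, up to rounding
```

Both estimators depend only on ranks, so replacing `x` by any increasing function of it leaves them unchanged.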



In the next section we show that the mutual information also fulfils the desiderata for a good dependence measure: it is always non-negative, it vanishes only for independent variables, and it is a functional of the copula density alone.

Copula Information. - Mutual information and copula densities can be connected by plugging eq. (5) into eq. (2) and performing the simple change of variables $u = F_X(x)$ and $v = F_Y(y)$, to conclude that:

$$ I(X,Y) = \int_{[0,1]^2} du\, dv\, c[u,v] \log c[u,v] = -S[c] \qquad (11) $$

where $S[c]$ is the differential entropy associated with the $c[u,v]$ distribution, which we will (following [10]) conveniently name the copula entropy. Notice that $S[c] \le 0$, as can be shown by considering eq. (5) together with Jensen's inequality, since $-\log x$ is a convex function. This simple result shows that the mutual information is invariant under arbitrary choices of the marginal densities $f_X(x)$ and $f_Y(y)$. This connection also implies that using a maximum entropy principle to choose a copula function given constraints is analogous to assuming the least informative dependence (minimum mutual information) that explains the constraints, which is a reasonable principle [11]. This provides yet another interpretation for the mutual information: it quantifies the information content of the coupling (copula) function. From the identity $S[\phi] = S[f_X] + S[f_Y] - I(X,Y)$ and eq. (11), we have:

$$ S[\phi] = S[f_X] + S[f_Y] + S[c] \qquad (12) $$

In words, the total information content can be uniquely decomposed into the information content of each variable plus the information content of the dependence between them.
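As a numerical check of eq. (11), the sketch below integrates $c \log c$ for the Gaussian copula and compares the result with the closed form $-\frac{1}{2}\log(1-\rho^2)$. The change of variables $u = \Phi(a)$, $v = \Phi(b)$ and the trapezoidal rule on a truncated square are choices made for this illustration only.

```python
import math


def gaussian_copula_mi(rho, half_width=8.0, step=0.1):
    """Numerically evaluate eq. (11), I = int c log c du dv, for the Gaussian copula.

    Substituting u = Phi(a), v = Phi(b) (so du dv = phi(a) phi(b) da db) maps
    the unit square onto the plane, where the integrand is smooth and decays
    like a Gaussian. The plane is truncated to [-half_width, half_width]^2
    and integrated with the trapezoidal rule.
    """
    one_minus = 1.0 - rho * rho
    n = int(round(2.0 * half_width / step))
    grid = [-half_width + k * step for k in range(n + 1)]
    total = 0.0
    for i, a in enumerate(grid):
        wa = 0.5 if i in (0, n) else 1.0
        for j, b in enumerate(grid):
            wb = 0.5 if j in (0, n) else 1.0
            # log c[Phi(a), Phi(b)] for the Gaussian copula of eq. (4)
            log_c = (-0.5 * math.log(one_minus)
                     - (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * one_minus))
            weight = math.exp(-0.5 * (a * a + b * b)) / (2.0 * math.pi)  # phi(a) phi(b)
            total += wa * wb * weight * math.exp(log_c) * log_c
    return total * step * step


for rho in (0.0, 0.5, 0.9):
    print(rho, gaussian_copula_mi(rho), -0.5 * math.log(1.0 - rho * rho))
```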

Information excess. - When quantifying dependence, it is common practice to start by measuring the linear correlation. In the language we have introduced, that is analogous to assuming a Gaussian copula described by a single parameter $\rho$. However, the notion that this parameter can be measured by the usual linear correlation relies upon the additional assumption that the marginals are also Gaussian, since the linear correlation also depends on the marginals. This particular copula is a very special case, as it assumes that the information contained in the dependence between variables is minimal given $\rho$. This minimal mutual information content of a Gaussian copula is given by [5]:

$$ I_{\rm Gauss}(\rho) = -\frac{1}{2} \log(1 - \rho^2) \qquad (13) $$

which can also be written as a function of the observable rank correlation using eq. (8). If this assumption of minimal dependence given the parameter $\rho$ fails, an excess of information in the dependence with respect to the Gaussian, $I_{\rm excess} = I(X,Y) - I_{\rm Gauss}(\rho)$, is observed. An algorithm for efficient estimation of the mutual information $I(X,Y)$ has been proposed in [12]. Together with a good estimate for $\rho$, it provides a diagnostic tool for information excess. The observation of an excess means that the dependence cannot be specified by the linear correlation alone, even after the identification of non-Gaussian marginals.
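The diagnostic just described amounts to one subtraction once estimates of $I(X,Y)$ and of the rank correlation are available. In the sketch below, the mutual-information value is assumed to come from an estimator such as that of [12]. The function names and the example numbers are hypothetical.

```python
import math


def gaussian_mi(rho):
    """Eq. (13): minimal mutual information of a Gaussian copula with parameter rho."""
    return -0.5 * math.log(1.0 - rho * rho)


def rho_from_rank_correlation(rho_rank):
    """Invert eq. (8) to express rho through the observable rank correlation."""
    return 2.0 * math.sin(math.pi * rho_rank / 6.0)


def information_excess(mi_estimate, rho_rank):
    """I_excess = I(X, Y) - I_Gauss(rho), with rho obtained from the rank correlation.

    `mi_estimate` is assumed to come from a mutual-information estimator
    (for instance the one of [12]); `rho_rank` from a sample version of eq. (7).
    """
    return mi_estimate - gaussian_mi(rho_from_rank_correlation(rho_rank))


rho = 0.6
rank = 6.0 / math.pi * math.asin(rho / 2.0)          # eq. (8)
print(information_excess(gaussian_mi(rho), rank))   # ~0: no excess for a Gaussian copula
print(information_excess(0.40, rank))               # positive: excess over the Gaussian value
```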
Fig. 1: Linear correlation is underestimated in the case of non-Gaussian marginals. If both marginals and copula are Gaussian, the joint distribution can be placed on the lower bound for the mutual information. A change in marginals that keeps the copula fixed preserves the mutual information; however, correlation estimates are displaced inwards.

If the marginals are non-Gaussian, neither the mutual information nor the parameter $\rho$ is affected; however, the linear correlation estimate $\rho(X,Y)$ consistently underestimates $|\rho|$. This can be seen by considering the $I(X,Y)$ versus $\rho$ plane, in which the curve described by eq. (13) represents a lower bound for the mutual information, as depicted in fig. 1. For a Gaussian copula, the parameter $\rho$ is measured by the linear correlation only if the marginals are also Gaussian. In this case we can locate a particular joint probability density on the curve of minimal mutual information for a given $\rho$. Suppose that the marginals are changed into non-Gaussian densities. As the copula of the variables is unaltered, the mutual information is also unchanged, but the linear correlation can change. As the curve represents a lower bound for the mutual information given $\rho$, the linear correlation can only move inwards, hence underestimating $|\rho|$. In order to find $\rho$ correctly, we first have to estimate a marginal invariant measure, such as the rank correlation given by eq. (7), and then employ an inversion relation such as eq. (8).
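The effect can be reproduced with a small simulation. The sample size, the seed, and the monotone maps $x \mapsto e^x$ and $y \mapsto -e^{-y}$ are arbitrary choices for this illustration. The copula stays Gaussian with parameter $\rho = 0.8$, so Kendall's tau inverted through eq. (10) should recover $\rho$, while the linear correlation should fall well below it.

```python
import math
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau(xs, ys):
    n, s = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            s += ((xs[j] > xs[i]) - (xs[j] < xs[i])) * ((ys[j] > ys[i]) - (ys[j] < ys[i]))
    return 2.0 * s / (n * (n - 1))


def sample_gaussian_copula_skewed_marginals(rho, n, seed=1):
    """Gaussian copula with parameter rho, pushed through the monotonically
    increasing maps x -> exp(x) and y -> -exp(-y) (non-Gaussian marginals)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        g = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        xs.append(math.exp(z1))
        ys.append(-math.exp(-g))
    return xs, ys


rho = 0.8
xs, ys = sample_gaussian_copula_skewed_marginals(rho, n=1000)
r_linear = pearson(xs, ys)
r_kendall = math.sin(0.5 * math.pi * kendall_tau(xs, ys))  # inversion of eq. (10)
print(f"linear correlation: {r_linear:.3f}; rho from Kendall's tau: {r_kendall:.3f}")
```

For these particular marginals the population linear correlation is $(1 - e^{-\rho})/(e - 1) \approx 0.32$ at $\rho = 0.8$. The sample value printed above should therefore sit near that figure rather than near 0.8.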






Fig. 2: Mutual information estimates following [12] versus Kendall's tau for pairs of series of daily log-returns ($\log[P_{\rm close}/P_{\rm open}]$, where $P_{\rm close}$ and $P_{\rm open}$ are, respectively, close and open prices) of 150 stocks composing the S&P500 index over the period from January 2, 1990 to September 16, 2008 (around 4700 samples per series). Bootstrap error bars represent a 90% confidence interval. Note that, within this confidence interval, a large number of the pairs display a non-zero information excess with respect to the Gaussian copula.
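The log-return definition and the percentile bootstrap behind this caption can be sketched as follows. This is a generic percentile bootstrap over resampled pairs, applied here to Kendall's tau. The text does not state whether pairs or blocks were resampled, so ignoring serial dependence is an assumption of the sketch. The prices are synthetic and the helper names are hypothetical.

```python
import math
import random


def log_returns(open_prices, close_prices):
    """Daily log-returns r_t = log(P_close / P_open)."""
    return [math.log(c / o) for o, c in zip(open_prices, close_prices)]


def kendall_tau(xs, ys):
    n, s = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            s += ((xs[j] > xs[i]) - (xs[j] < xs[i])) * ((ys[j] > ys[i]) - (ys[j] < ys[i]))
    return 2.0 * s / (n * (n - 1))


def bootstrap_interval(xs, ys, statistic, n_boot=1000, level=0.90, seed=0):
    """Percentile bootstrap interval for a statistic of paired series.

    Pairs (x_t, y_t) are resampled with replacement, which treats them as
    independent across t (serial dependence is ignored in this sketch).
    """
    rng = random.Random(seed)
    n = len(xs)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(statistic([xs[i] for i in idx], [ys[i] for i in idx]))
    values.sort()
    lo = values[int(0.5 * (1.0 - level) * n_boot)]
    hi = values[min(n_boot - 1, int(0.5 * (1.0 + level) * n_boot))]
    return lo, hi


rng = random.Random(42)
market = [rng.gauss(0.0, 0.01) for _ in range(60)]  # shared component of both stocks
open_a, open_b = [100.0] * 60, [50.0] * 60
close_a = [100.0 * math.exp(m + rng.gauss(0.0, 0.01)) for m in market]
close_b = [50.0 * math.exp(m + rng.gauss(0.0, 0.01)) for m in market]
r_a, r_b = log_returns(open_a, close_a), log_returns(open_b, close_b)
lo, hi = bootstrap_interval(r_a, r_b, kendall_tau, n_boot=300)
print(f"tau = {kendall_tau(r_a, r_b):.3f}, 90% interval [{lo:.3f}, {hi:.3f}]")
```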

As an applied example, fig. 2 shows estimates for the mutual information obtained by the Kraskov-Stögbauer-Grassberger (KSG) method¹ [12] against Kendall's tau for pairs of series with daily log-returns $r_t = \log[P_{\rm close}/P_{\rm open}]$ (where $P_{\rm close}$ and $P_{\rm open}$ are, respectively, close and open prices) of 150 stocks composing the Standard & Poor's 500 index (S&P500) over the period from January 2, 1990 to September 16, 2008 (around 4700 points in each series). The error bars have been obtained employing the bootstrap technique [13]. The observed information excess can be traced to time-varying cross-correlations [14] and to dependences between cross-correlations and returns [15], which jointly yield non-Gaussian copulas. Here we have used Kendall's tau as a marginal invariant measure. In the next section we show that the particular marginal invariant plane defined by the mutual information and Kendall's tau is sufficient to identify the best T-copula representing the non-Gaussian dependences shown by the data.
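For reference, a compact brute-force version of the KSG estimator (the first of the two algorithms in [12]) is sketched below. It is not the authors' C++ library. The neighbour search is O(N^2), and the digamma function is hand-written because the Python standard library lacks one. The choice $k = 3$ and the synthetic Gaussian test data are arbitrary.

```python
import math
import random


def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence, then asymptotic series)."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def ksg_mutual_information(xs, ys, k=3):
    """KSG estimator (first algorithm of Kraskov, Stogbauer and Grassberger), in nats.

    Brute-force O(N^2) neighbour search with the max-norm; adequate for a few
    thousand points, whereas production code would use a k-d tree.
    """
    n = len(xs)
    acc = 0.0
    for i in range(n):
        # distance to the k-th nearest neighbour in the joint (x, y) space
        dist = sorted(max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
                      for j in range(n) if j != i)
        eps = dist[k - 1]
        # number of marginal neighbours strictly closer than eps
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        acc += digamma(nx + 1) + digamma(ny + 1)
    return digamma(k) + digamma(n) - acc / n


rng = random.Random(7)
rho = 0.8
xs, ys = [], []
for _ in range(500):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(z1)
    ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
mi_hat = ksg_mutual_information(xs, ys, k=3)
print(mi_hat, -0.5 * math.log(1.0 - rho * rho))  # estimate vs. exact Gaussian value
```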

Copula Identification. - Given a data set $\{(x_t, y_t)\}_{t=1}^{M}$ independently sampled from an unknown joint density $\phi(x,y)$, the best approximation $\phi_\theta(x,y)$ within a manifold $\mathcal{T}$, parameterized by $\theta$, can be found by minimizing a sample estimate of the relative entropy [6]:

$$ S[\phi \,\|\, \phi_\theta] = \int dx\, dy\, \phi(x,y) \log \frac{\phi(x,y)}{\phi_\theta(x,y)} \qquad (14) $$



¹ We provide a C++ library to calculate mutual information with this method, with confidence bands estimated with the bootstrap technique [13], at http://code.google.com/p/libmi/

By considering eq. (5) and performing appropriate variable changes, we can write:

$$ S[\phi \,\|\, \phi_\theta] = S[c \,\|\, c_{\theta_c}] + S[f_X \,\|\, f_X^{\theta_X}] + S[f_Y \,\|\, f_Y^{\theta_Y}] \qquad (15) $$



which is just the decomposition (12) in terms of relative entropies. Thus the inference procedure can be implemented by independently minimizing the relative entropy for the empirical marginals and for the copula density. By employing relationship (11), the contribution from the copula in eq. (15) can be further rewritten as:

$$ S[c \,\|\, c_{\theta_c}] = I(X,Y) - L_\infty(\theta_c) \ge 0 \qquad (16) $$

where $L_\infty(\theta_c) = \int_{[0,1]^2} du\, dv\, c[u,v] \log c_{\theta_c}[u,v]$ is the asymptotic copula log-likelihood. Notice that Jensen's inequality implies that $L_\infty(\theta_c) \le I(X,Y)$. Consequently, minimizing $S[c \,\|\, c_{\theta_c}]$ is equivalent to maximizing the likelihood, with the mutual information $I(X,Y)$ as an upper bound.
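For the Gaussian copula family, the empirical counterpart of $L_\infty(\theta_c)$ is the average of $\log c_\rho$ over pseudo-observations. Maximizing it gives a rank-based pseudo-likelihood estimate of $\rho$. The sketch below is illustrative only: the grid search, the rank-based pseudo-observations and the synthetic data are assumptions and do not reproduce the procedure of the text.

```python
import math
import random
from statistics import NormalDist


def pseudo_observations(xs):
    """Rank-based estimate of F(x): rank / (n + 1) (assumes no ties)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def mean_log_copula_density(a, b, rho):
    """Average of log c_rho over normal scores: an empirical estimate of
    L_inf(theta_c) in eq. (16) for the Gaussian copula family."""
    one_minus = 1.0 - rho * rho
    total = 0.0
    for x, y in zip(a, b):
        total += (-0.5 * math.log(one_minus)
                  - (rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus))
    return total / len(a)


def fit_gaussian_copula(xs, ys):
    """Maximize the copula pseudo-likelihood over a grid of rho values."""
    nd = NormalDist()
    a = [nd.inv_cdf(u) for u in pseudo_observations(xs)]
    b = [nd.inv_cdf(v) for v in pseudo_observations(ys)]
    grid = [k / 100.0 for k in range(-99, 100)]
    best = max(grid, key=lambda r: mean_log_copula_density(a, b, r))
    return best, mean_log_copula_density(a, b, best)


rng = random.Random(3)
rho = 0.6
xs, ys = [], []
for _ in range(1000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(z1 ** 3)                                              # non-Gaussian marginal
    ys.append(math.exp(rho * z1 + math.sqrt(1 - rho * rho) * z2))   # non-Gaussian marginal
rho_hat, loglik = fit_gaussian_copula(xs, ys)
print(rho_hat, loglik, -0.5 * math.log(1 - rho * rho))
```

Because the copula of the synthetic data is Gaussian, the maximized mean log-likelihood should approach the bound $I(X,Y) = -\frac{1}{2}\log(1-\rho^2) \approx 0.223$ of eq. (16). At finite sample size it can fluctuate slightly around that value.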
Fig. 3: Information excess $I_{\rm Excess}(\nu)$ as provided by eq. (19) (full line). Circles show estimates for 20 runs with T-copulas with known $\nu$ for $\rho = 0.1, 0.5, 0.9$ and arbitrary marginals. Error bars represent one standard deviation. Estimates have been computed by employing the KSG method.

The estimation of $I(X,Y)$ can be employed to measure the quality of a fit within the chosen family $\mathcal{T}$. In particular, suppose we choose a family for which $L_\infty(\theta_c)$ is known analytically. If this family also contains a distribution that saturates the bound, we can use an efficient estimator for the mutual information, such as that of [12], to identify the best copula $\theta_c$ within $\mathcal{T}$ right away.

In this procedure the identification of the copula is disentangled from the choice of marginals from the start. The T-copula is an interesting choice, as its mutual information can be evaluated analytically. The T-copula density is defined in two dimensions as:

$$ c_{\rho,\nu}[u,v] = \frac{\Gamma\left(\frac{\nu+2}{2}\right)\Gamma\left(\frac{\nu}{2}\right)}{\left[\Gamma\left(\frac{\nu+1}{2}\right)\right]^2\sqrt{1-\rho^2}}\; \frac{\left[1 + \frac{q_\rho(x,y)}{\nu}\right]^{-\frac{\nu+2}{2}}}{\left[\left(1+\frac{x^2}{\nu}\right)\left(1+\frac{y^2}{\nu}\right)\right]^{-\frac{\nu+1}{2}}}, \qquad x = t_\nu^{-1}(u),\; y = t_\nu^{-1}(v) \qquad (17) $$

with $q_\rho(x,y) = \frac{x^2 + y^2 - 2\rho x y}{1-\rho^2}$ and $t_\nu^{-1}(u)$ denoting the inverse of the distribution function of the univariate Student T density with $\nu$ degrees of freedom. It can be shown (see appendix) that the mutual information of a multivariate T-copula can be decomposed as:

$$ I_T(\rho, \nu) = I_{\rm Gauss}(\rho) + I_{\rm Excess}(\nu) \qquad (18) $$

where, in two dimensions (2D), $I_{\rm Gauss}(\rho)$ is given by eq. (13). The excess information term depends only on the number of degrees of freedom $\nu$. In 2D it is given by:



$$ I_{\rm Excess}(\nu) = 2 \log\left[\sqrt{\frac{\nu}{2\pi}}\, B\left(\frac{\nu}{2}, \frac{1}{2}\right)\right] - \left(1 + \frac{2}{\nu}\right) + (1 + \nu)\left[\psi\left(\frac{\nu+1}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right] \qquad (19) $$

where $B(x,y)$ is the Beta function defined as

$$ B(x,y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} \qquad (20) $$

and $\psi(x)$ is the digamma function. Fig. 3 shows the T-copula information excess $I_{\rm Excess}$ as provided by eq. (19). The parameter $\rho$ yields the linear correlation in the purely Gaussian case ($\nu \to \infty$), but must be estimated through a marginal invariant measure of concordance/dependence in the general case. For T-copulas, $\rho_{\rm rank}$ is a function of both $\rho$ and $\nu$ that is not known in any simple form. However, in order to identify the appropriate T-copula, a simpler alternative is to employ Kendall's tau, which is a function of $\rho$ alone, given by eq. (10). We can estimate Kendall's tau and then employ the excess of information in relation to a Gaussian copula to find $\nu$. Fig. 3 shows the result of simulations using data sampled from a joint distribution composed of a copula density with known parameters and arbitrary marginals. Going back to fig. 2, the best copula within the manifold of T-copulas can be immediately identified for each point in the mutual information versus Kendall's tau plane.
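Eqs. (10), (13) and (19) turn a point of the Kendall's tau versus mutual information plane into T-copula parameters, as sketched below. The sketch assumes that $I_{\rm Excess}(\nu)$ decreases monotonically with $\nu$, so that bisection applies, and restricts $\nu$ to an arbitrary bracket $[0.5, 1000]$. The digamma routine is a hand-written approximation, since the Python standard library has none, and the function names are hypothetical.

```python
import math


def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence, then asymptotic series)."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def log_beta(x, y):
    """log B(x, y) from eq. (20)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def t_copula_excess(nu):
    """Eq. (19): information excess of a bivariate T-copula, in nats."""
    return (2.0 * (0.5 * math.log(nu / (2.0 * math.pi)) + log_beta(0.5 * nu, 0.5))
            - (1.0 + 2.0 / nu)
            + (1.0 + nu) * (digamma(0.5 * (nu + 1.0)) - digamma(0.5 * nu)))


def identify_t_copula(mi, tau, nu_lo=0.5, nu_hi=1000.0):
    """Map (mutual information, Kendall's tau) to T-copula parameters (rho, nu).

    Assumes I_Excess(nu) decreases monotonically on [nu_lo, nu_hi].
    """
    rho = math.sin(0.5 * math.pi * tau)             # eq. (10)
    excess = mi + 0.5 * math.log(1.0 - rho * rho)   # I - I_Gauss(rho), eq. (13)
    if excess >= t_copula_excess(nu_lo):
        return rho, nu_lo
    if excess <= t_copula_excess(nu_hi):
        return rho, math.inf                        # indistinguishable from Gaussian
    lo, hi = math.log(nu_lo), math.log(nu_hi)       # bisection in log(nu)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if t_copula_excess(math.exp(mid)) > excess:
            lo = mid                                # excess too large -> nu must grow
        else:
            hi = mid
    return rho, math.exp(0.5 * (lo + hi))


# Round trip: a T-copula with rho = 0.5 (tau = 1/3) and nu = 5.
mi = -0.5 * math.log(0.75) + t_copula_excess(5.0)
print(identify_t_copula(mi, 1.0 / 3.0))
```

As a spot check, eq. (19) gives $I_{\rm Excess}(1) = \log(\pi/2) + 4\log 2 - 3 \approx 0.224$ nats for the Cauchy case $\nu = 1$.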

Conclusions. - The literature on information and copula theories has developed in relative isolation. In this paper we sought to discuss a couple of consequences of the connections between these two threads.

Copula theory can be employed to factorize a general joint distribution into marginal fluctuations and a dependence core that is not unique. On the other hand, a combination of copula and information theories provides a unique decomposition in terms of global information content measures. This decomposition yields a simple test of Gaussianity through the estimate of the information excess (a procedure that is simpler than, e.g., [16] or [17]). It also suggests a method for copula identification based on information content matching. This method displays a simple formal equivalence to the usual maximum likelihood methods (e.g. [18]). A C++ library produced by the authors for determining mutual information from pairs of time series with the KSG algorithm, with bootstrap confidence bands, is available at http://code.google.com/p/libmi/

This approach also clarifies the danger of using linear correlation as a measure of dependence for, e.g., portfolio optimization or time series analysis, as this measure is bound to underestimate dependence that would be better captured by easily estimated marginal invariant measures [8, 12]. Finally, we think that a unified understanding of information and copula theories may be a useful source of new fundamental ideas for the analysis of multivariate data arising from complex physical phenomena.
Appendix: Information Excess for T Copulas.

In this appendix we present a derivation of the entropy 
and mutual information of Student T distributions. 

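The standard Student T distribution in d dimensions is given by:

$$ p_d(\mathbf{t} \mid \Sigma, \nu) = \frac{1}{Z_d(\Sigma, \nu)} \left(1 + \frac{\mathbf{t}^T \Sigma^{-1} \mathbf{t}}{\nu}\right)^{-\frac{\nu+d}{2}} \qquad (A.1) $$

where $\nu$ is a parameter and $\Sigma$ is the correlation matrix. The normalizing prefactor is defined as:

$$ Z_d(\Sigma, \nu) = \frac{(\pi\nu)^{d/2} |\Sigma|^{1/2}}{\Gamma\left(\frac{d}{2}\right)}\, B\left(\frac{\nu}{2}, \frac{d}{2}\right) \qquad (A.2) $$

with $B(x,y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$ being the Beta function. For $d = 2$ this simplifies to:

$$ p_2(x, y \mid \rho, \nu) = \frac{\Gamma\left(1 + \frac{\nu}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) \pi \nu \sqrt{1-\rho^2}} \left(1 + \frac{q_\rho(x,y)}{\nu}\right)^{-\left(1 + \frac{\nu}{2}\right)} \qquad (A.3) $$

with $q_\rho(x,y) = \frac{x^2 + y^2 - 2\rho x y}{1 - \rho^2}$.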
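The T-copula density of eq. (17) is the ratio of the bivariate density (A.3) to the product of its univariate marginals (eq. (A.1) with d = 1), which the sketch below checks numerically. It works directly with the scores $x = t_\nu^{-1}(u)$ and $y = t_\nu^{-1}(v)$, because the Python standard library has no Student T quantile function; SciPy's `scipy.stats.t.ppf` would provide one. The parameter values and function names are illustrative.

```python
import math


def student_t_pdf(x, nu):
    """Univariate standard Student T density, eq. (A.1) with d = 1."""
    log_norm = (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + 1.0) * math.log1p(x * x / nu))


def q_rho(x, y, rho):
    return (x * x + y * y - 2.0 * rho * x * y) / (1.0 - rho * rho)


def bivariate_t_pdf(x, y, rho, nu):
    """Eq. (A.3)."""
    log_norm = (math.lgamma(1.0 + 0.5 * nu) - math.lgamma(0.5 * nu)
                - math.log(math.pi * nu * math.sqrt(1.0 - rho * rho)))
    return math.exp(log_norm - (1.0 + 0.5 * nu) * math.log1p(q_rho(x, y, rho) / nu))


def t_copula_density(x, y, rho, nu):
    """Eq. (17) written in terms of the scores x = t_nu^{-1}(u), y = t_nu^{-1}(v)."""
    log_pref = (math.lgamma(0.5 * nu + 1.0) + math.lgamma(0.5 * nu)
                - 2.0 * math.lgamma(0.5 * (nu + 1.0)) - 0.5 * math.log(1.0 - rho * rho))
    return math.exp(log_pref
                    - (0.5 * nu + 1.0) * math.log1p(q_rho(x, y, rho) / nu)
                    + 0.5 * (nu + 1.0) * (math.log1p(x * x / nu) + math.log1p(y * y / nu)))


x, y, rho, nu = 0.3, -1.2, 0.4, 4.0
ratio = bivariate_t_pdf(x, y, rho, nu) / (student_t_pdf(x, nu) * student_t_pdf(y, nu))
print(ratio, t_copula_density(x, y, rho, nu))
# With rho = 0 the T-copula density is still not identically 1:
print(t_copula_density(0.0, 0.0, 0.0, 1.0), math.pi / 2)
```

The second line illustrates that, unlike the Gaussian case, $\rho = 0$ does not reduce the T-copula to the independence copula. At the centre the density equals $\Gamma(3/2)\Gamma(1/2)/\Gamma(1)^2 = \pi/2$ for $\nu = 1$, consistent with a non-zero information excess.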

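The differential entropy of a given set of variables $\mathbf{t}$ distributed as $p_d(\mathbf{t})$ is given by:

$$ S[p_d] = -\int d^d t\, p_d(\mathbf{t}) \log p_d(\mathbf{t}) \qquad (A.4) $$

The mutual information for d dimensions is:

$$ I(X_1, X_2, \ldots, X_d) = \int d^d x\, p_d(\mathbf{x}) \log \frac{p_d(\mathbf{x})}{\prod_{i=1}^{d} p_i(x_i)} \qquad (A.5) $$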
For variables with identical marginals ($p_i(x) = p_1(x)$ for all $i = 1, 2, \ldots, d$), this can be written in terms of entropies as:

$$ I(X_1, X_2, \ldots, X_d) = d\, S[p_1] - S[p_d] \qquad (A.6) $$

We employ a "replica trick" [19] to write:

$$ S[p_d] = -\lim_{n \to 0} \frac{\partial}{\partial n} \int d^d t\, p_d(\mathbf{t})^{n+1} \qquad (A.7) $$

where the limit $n \to 0$ can be regarded as an analytical continuation of a sequence of integers, which is known to give sensible results if $p_d(\mathbf{t})$ has a unique extremum. This calculation is not rigorous, but it is nicely verified by the simulations depicted in fig. 3.
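A numerical illustration of eq. (A.7) for $d = 1$ follows. The derivative in $n$ is replaced by a central finite difference, the integral by a trapezoidal rule, and the result is compared with the closed form (A.11) for $d = 1$. Because the finite difference evaluates the integral at non-integer $n$ directly, this checks the derivative identity rather than the analytic-continuation argument. The step sizes, the truncation of the real line and $\nu = 5$ are arbitrary choices, and the digamma routine is a hand-written approximation.

```python
import math


def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence, then asymptotic series)."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def student_t_entropy_closed_form(nu):
    """Eq. (A.11) for d = 1 (Sigma = 1)."""
    log_b = math.lgamma(0.5 * nu) + math.lgamma(0.5) - math.lgamma(0.5 * (nu + 1.0))
    return (0.5 * math.log(math.pi * nu) + log_b - math.lgamma(0.5)
            + 0.5 * (nu + 1.0) * (digamma(0.5 * (nu + 1.0)) - digamma(0.5 * nu)))


def student_t_entropy_replica(nu, h=1e-3, half_width=400.0, step=0.01):
    """S = -d/dn int p^(n+1) dt at n = 0, i.e. eq. (A.7) with d = 1.

    Central finite difference in n; trapezoidal rule in t on
    [-half_width, half_width] (the tails beyond are neglected).
    """
    log_norm = (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
                - 0.5 * math.log(nu * math.pi))
    m = int(round(2.0 * half_width / step))
    log_p = [log_norm - 0.5 * (nu + 1.0) * math.log1p((-half_width + k * step) ** 2 / nu)
             for k in range(m + 1)]

    def moment(n):  # trapezoidal estimate of int p(t)^(n + 1) dt
        s = sum(math.exp((n + 1.0) * lp) for lp in log_p)
        s -= 0.5 * (math.exp((n + 1.0) * log_p[0]) + math.exp((n + 1.0) * log_p[-1]))
        return s * step

    return -(moment(h) - moment(-h)) / (2.0 * h)


print(student_t_entropy_replica(5.0), student_t_entropy_closed_form(5.0))
```

For the Cauchy case $\nu = 1$ the closed form reduces to $\log(4\pi)$, the familiar entropy of the standard Cauchy distribution.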

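We can always simplify the integral in eq. (A.7) by making the transformation $\mathbf{x} = U\mathbf{t}$, where $U$ is the unitary matrix that diagonalizes $\Sigma$. Calling $\mathcal{I}$ this integral, we have:

$$ \mathcal{I} = \frac{1}{Z_d(\Sigma,\nu)^{n+1}} \int d^d x \left(1 + \frac{1}{\nu}\sum_{i=1}^{d} \frac{x_i^2}{\lambda_i}\right)^{-\frac{(n+1)(\nu+d)}{2}} \qquad (A.8) $$

where $\lambda_i$ is the eigenvalue of $\Sigma$ corresponding to the i-th direction. We can also choose variables $z_i = x_i / \sqrt{\nu \lambda_i}$ and use spherical coordinates to write:

$$ \mathcal{I} = \frac{2\pi^{d/2}\, (\nu^d |\Sigma|)^{1/2}}{\Gamma\left(\frac{d}{2}\right) Z_d(\Sigma,\nu)^{n+1}} \int_0^\infty dr\, r^{d-1} \left(1 + r^2\right)^{-\frac{(n+1)(\nu+d)}{2}} \qquad (A.9) $$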

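The above integral is related to the Beta function, yielding:

$$ \mathcal{I} = \frac{(\pi\nu)^{d/2} |\Sigma|^{1/2}}{\Gamma\left(\frac{d}{2}\right) Z_d(\Sigma,\nu)^{n+1}}\, B\left(\frac{d}{2}, \frac{\nu + n(\nu+d)}{2}\right) \qquad (A.10) $$

Plugging it into eq. (A.7) and using our definition (A.2) for $Z_d(\Sigma,\nu)$ gives:

$$ S[p_d] = \frac{1}{2}\log\left[(\pi\nu)^d |\Sigma|\right] + \log\frac{B\left(\frac{\nu}{2}, \frac{d}{2}\right)}{\Gamma\left(\frac{d}{2}\right)} + \frac{\nu+d}{2}\left[\psi\left(\frac{\nu+d}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right] \qquad (A.11) $$

where $\psi(x)$ is the digamma function.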

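The mutual information of the d-dimensional Student distribution can be calculated using eq. (A.6) with the entropy given by eq. (A.11):

$$ I_d(\Sigma, \nu) = -\frac{1}{2}\log|\Sigma| + \log\frac{\left[B\left(\frac{\nu}{2},\frac{1}{2}\right)\right]^d \Gamma\left(\frac{d}{2}\right)}{\pi^{d/2}\, B\left(\frac{\nu}{2},\frac{d}{2}\right)} - \frac{\nu(d-1)}{2}\,\psi\left(\frac{\nu}{2}\right) + \frac{d(\nu+1)}{2}\,\psi\left(\frac{\nu+1}{2}\right) - \frac{\nu+d}{2}\,\psi\left(\frac{\nu+d}{2}\right) \qquad (A.12) $$

Notice that the only term depending on the correlation matrix $\Sigma$ is the mutual information of a Gaussian distribution, $I_{\rm Gauss} = -\frac{1}{2}\log|\Sigma|$. The remaining term is the information excess. For $d = 2$ we have:

$$ I_2(\rho, \nu) = I_{\rm Gauss} + I_{\rm Excess} \qquad (A.13) $$

with

$$ I_{\rm Gauss} = -\frac{1}{2}\log(1 - \rho^2) \qquad (A.14) $$

and

$$ I_{\rm Excess} = 2\log\left[\sqrt{\frac{\nu}{2\pi}}\, B\left(\frac{\nu}{2},\frac{1}{2}\right)\right] - \left(1 + \frac{2}{\nu}\right) + (1+\nu)\left[\psi\left(\frac{\nu+1}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right] \qquad (A.15) $$

where we used that $B(x, 1) = 1/x$ and $\psi(x+1) - \psi(x) = 1/x$.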
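Eq. (A.12) can be coded directly and checked in two ways: against its $d = 2$ specialization (A.13)-(A.15), and against the requirement that a single variable ($d = 1$) carries no mutual information. The helper names and parameter values are illustrative. The digamma routine is a hand-written approximation, since the Python standard library has none.

```python
import math


def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence, then asymptotic series)."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def log_beta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def student_t_mi(d, log_det_sigma, nu):
    """Eq. (A.12): mutual information (nats) of a d-dimensional Student T
    distribution with correlation matrix Sigma, given log|Sigma|."""
    return (-0.5 * log_det_sigma
            + d * log_beta(0.5 * nu, 0.5) + math.lgamma(0.5 * d)
            - 0.5 * d * math.log(math.pi) - log_beta(0.5 * nu, 0.5 * d)
            - 0.5 * nu * (d - 1) * digamma(0.5 * nu)
            + 0.5 * d * (nu + 1.0) * digamma(0.5 * (nu + 1.0))
            - 0.5 * (nu + d) * digamma(0.5 * (nu + d)))


def t_copula_excess(nu):
    """Eq. (A.15)."""
    return (2.0 * (0.5 * math.log(nu / (2.0 * math.pi)) + log_beta(0.5 * nu, 0.5))
            - (1.0 + 2.0 / nu)
            + (1.0 + nu) * (digamma(0.5 * (nu + 1.0)) - digamma(0.5 * nu)))


rho, nu = 0.3, 4.0
general = student_t_mi(2, math.log(1.0 - rho * rho), nu)
split = -0.5 * math.log(1.0 - rho * rho) + t_copula_excess(nu)  # eqs. (A.13)-(A.15)
print(general, split)
print(student_t_mi(1, 0.0, nu))  # a single variable carries no mutual information
```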